Project - Artificial Neural Networks - Part 1 (Classification)

CONTEXT:

A communications equipment manufacturing company has a product responsible for emitting informative signals. The company wants to build a machine learning model that can predict the equipment's signal quality from various parameters.

DATA DESCRIPTION:

The data set contains information on various signal tests performed:

  1. Parameters: Various measurable signal parameters.
  2. Signal_Quality: Final signal strength or quality

PROJECT OBJECTIVE:

The need is to build a classifier that can use these parameters to determine the signal strength or quality as a number.

STRATEGY:

  1. Use a multi-stage strategy to tune the hyperparameters of the ANN using an open-source search-automation package. We have chosen Optuna for this purpose.
  2. Develop a web-app playground to test different combinations of hyperparameter values and visualise the training and testing process using a live plot.

(1) Import all Python Libraries

(2) Data loading and verification

Observation:

  1. Parameter 1 to Parameter 11 are independent variables.
  2. Signal Strength is the dependent variable.

(2.1) Data Description

Observations:

  1. Variables Parameter 1 to Parameter 11 are of data type float.
  2. Signal Strength is of data type integer.

Observations:

  1. Parameter 8 has a very short range of 0.001 and may not add significant variance to the overall data set.
  2. Similarly, Parameter 5 has a very small interquartile range, with some significant outliers to the right of the distribution.
  3. Signal Strength is a discrete integer between 3 and 8.

(2.2) Data Verification

Observations:

  1. There are no NaN or null values in the given dataframe; no additional cleansing is required.
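A check like the one above can be sketched with pandas; the dataframe here is a synthetic stand-in for the project's data (11 parameter columns plus the target), not the actual file:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the project's dataframe: 11 parameters + target.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 11)),
                  columns=[f"Parameter {i}" for i in range(1, 12)])
df["Signal_Strength"] = rng.integers(3, 9, size=100)

# Count missing values per column; a zero total means no cleansing is needed.
missing_per_column = df.isnull().sum()
total_missing = int(missing_per_column.sum())
print(total_missing)  # 0 for this synthetic frame
```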

(3) EDA (Exploratory Data Analysis)

(3a) Univariate Analysis

Observations:

  1. Parameter 5 and Parameter 8 have very short IQRs.
  2. If outliers were eliminated, the overall variance in the data would reduce.
  3. It might be good to consider refactoring these variables as a ratio or raising them to a higher-degree exponent during feature engineering.
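The IQR comparison above can be sketched as follows; the two synthetic columns (one wide, one near-constant like Parameter 8) are illustrative, not the project's data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Synthetic columns: a wide one and a near-constant one mimicking Parameter 8.
df = pd.DataFrame({"Parameter 1": rng.normal(0.0, 1.0, 500),
                   "Parameter 8": rng.normal(0.99, 0.0003, 500)})

# IQR = Q3 - Q1 per column; a tiny IQR flags a low-variance candidate.
q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1
print(iqr)
```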

(3b) Multivariate Analysis

Observations:

  1. There are significant correlations between most of the variables.
  2. The most significant correlations are between Parameter 1 and Parameter 3, Parameter 1 and Parameter 8, and Parameter 6 and Parameter 7, at almost 0.67.
  3. Signal Strength is highly correlated with Parameter 10 and Parameter 11.
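A correlation scan like the one behind these observations can be sketched with `DataFrame.corr()`; the columns below are synthetic (two built from a shared signal, one independent), not the actual parameters:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
base = rng.normal(size=300)
# Two correlated synthetic parameters plus an independent one.
df = pd.DataFrame({"Parameter 1": base + rng.normal(scale=0.5, size=300),
                   "Parameter 3": base + rng.normal(scale=0.5, size=300),
                   "Parameter 5": rng.normal(size=300)})

# Pairwise Pearson correlations; high off-diagonal values flag
# multicollinearity candidates for feature engineering.
corr = df.corr()
print(corr.loc["Parameter 1", "Parameter 3"].round(2))
```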

(3c) Variance and Multicollinearity

Observations:

  1. Parameter 1, Parameter 4, Parameter 6, Parameter 7 and Parameter 11 are the top 5 variables showing high variance in the data.
  2. Parameter 8 shows nearly zero variance, with Parameter 5 the next lowest. These parameters could be discarded during modelling.

Observations:

  1. Parameters 8, 9, 11, 1, 10 and 2 have very high VIFs, i.e. multicollinearity.
  2. We will attempt to reduce this during feature engineering.

(4) Creating Helper Classes for Model Creation and Live Plotting

(4a) Class to create and customize DNNs.

This is a generic helper class to create an N-layered ANN with the flexibility to place BatchNormalization, Activation and Dropout layers.
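A minimal sketch of such a builder, assuming Keras; the function name, the `use_bn`/`dropout` knobs and the fixed BatchNormalization → Activation → Dropout order are illustrative, not the project's actual helper class:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_ann(n_features, hidden_units, n_classes,
              activation="relu", use_bn=True, dropout=0.2):
    """Create an N-layer ANN with optional BatchNorm/Activation/Dropout."""
    inputs = keras.Input(shape=(n_features,))
    x = inputs
    for units in hidden_units:
        x = layers.Dense(units)(x)          # linear projection
        if use_bn:
            x = layers.BatchNormalization()(x)
        x = layers.Activation(activation)(x)
        if dropout:
            x = layers.Dropout(dropout)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)

# 11 parameters in, 6 classes out (Signal Strength 3..8 mapped to 0..5).
model = build_ann(11, [64, 32], 6)
print(len(model.layers))
```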

(4b) Callback Class for Live plotting during training

This is a helper class to live-plot the training progress during modelling.
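The skeleton of such a callback, assuming Keras, might look like the following; the class name is hypothetical and the actual plotting call is left as a comment since it depends on the GUI:

```python
from tensorflow import keras

class LivePlotCallback(keras.callbacks.Callback):
    """Records metrics each epoch; a GUI would redraw a plot here instead."""
    def __init__(self):
        super().__init__()
        self.history = {"loss": [], "accuracy": []}

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        for key in self.history:
            if key in logs:
                self.history[key].append(logs[key])
        # In the real helper this is where the live plot is refreshed.

# Exercising the callback directly, outside of model.fit():
cb = LivePlotCallback()
cb.on_epoch_end(0, {"loss": 0.9, "accuracy": 0.4})
print(cb.history["loss"])
```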

(5) Feature Engineering and Feature Selection

(5a) Pre Modelling Baseline

Baselining strategy:

  1. We will create a baseline model to assess the performance with just the scaled data, with feature engineering and with feature selection.
  2. This baseline model/data will then be subjected to different levels of hyperparameter optimization.
  3. To baseline the model, we will assume some basic hyperparameters as shown below.
  4. The loss and performance metric for this classification problem will be sparse_categorical_crossentropy and accuracy respectively.
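The loss and metric setup can be sketched as follows, assuming Keras; the layer sizes are illustrative, and the 6-class output corresponds to Signal Strength values 3-8 shifted to 0-5 for sparse labels:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Minimal baseline: 11 inputs, one hidden layer, 6-way softmax output.
model = keras.Sequential([
    keras.Input(shape=(11,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(6, activation="softmax"),
])

# sparse_categorical_crossentropy takes integer class labels directly,
# so Signal Strength only needs shifting to 0..5, not one-hot encoding.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)
```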
1> Without Feature Engineering

Observation:

  1. An ANN model is created with BatchNormalization, Activation and Dropout layers, in that order.
  2. The model uses the defaults defined in the helper class for ANN model creation.
2> Feature Engineering
We will change the original variables into quadratic variables `(degree 2)` and drop `Parameter 8`, as its values are very close to `1` and raising them to a higher power will not create any significant variance.
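This step can be sketched with scikit-learn's `PolynomialFeatures`; the dataframe is synthetic, and whether interaction terms are included (they are here, by default) is an assumption about the actual engineering:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
cols = [f"Parameter {i}" for i in range(1, 12)]
df = pd.DataFrame(rng.normal(size=(50, 11)), columns=cols)

# Drop the near-constant Parameter 8 before generating degree-2 features.
reduced = df.drop(columns=["Parameter 8"])
poly = PolynomialFeatures(degree=2, include_bias=False)
engineered = poly.fit_transform(reduced)

# 10 original + 10 squares + 45 pairwise interactions = 65 features.
print(engineered.shape)
```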
Observation:
  1. We see that the VIF is significantly reduced from the original value.
Observation:
  1. We can see that there is no significant difference in accuracy between scaled and engineered data. However, the loss is higher for the engineered data.
3> Feature Selection
Observation:
  1. The model-based feature selection has returned 3 variables: Parameter 11, 7 and 10. This has significantly reduced the feature space.
  2. We suspect that this might reduce the performance of the model.
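Model-based selection of this kind can be sketched with scikit-learn's `SelectFromModel`; the random-forest estimator and the synthetic 11-feature dataset are stand-ins, not necessarily the selector the project used:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in: 11 features, a few informative ones, 6 classes.
X, y = make_classification(n_samples=400, n_features=11, n_informative=3,
                           n_classes=6, n_clusters_per_class=1,
                           random_state=0)

# Keep only features whose importance exceeds the mean importance.
selector = SelectFromModel(RandomForestClassifier(random_state=0)).fit(X, y)
X_selected = selector.transform(X)
print(X_selected.shape[1])  # number of features kept
```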

Observation:

  1. Surprisingly, the loss and performance are better than in the previous 2 experiments, as shown in the dashboard. This means that 3 of the 11 parameters are sufficient for decent performance.
Observations:

Based on the scoreboard above, accuracy, though only marginally, is highest for the data with feature selection; hence we will use it as the baseline for our hyperparameter tuning and model training.

(6) Hyperparameter Tuning using Optuna

https://optuna.readthedocs.io/en/stable/

Hyperparameter tuning strategy:

  1. Architecture Selection : Find the optimal number of layers, number of neurons per layer, optimal combination of Activation, BatchNormalization and Dropout.
  2. Coarse Tuning : Find the optimal weight initialization parameters, activation type and drop rate at each layer.
  3. Fine Tuning : Find the optimal optimizer and its parameters.
  4. Final Tuning : Find the optimal number of epochs and batch size

At each stage, the parameters determined in the previous stage flow into the next one to override defaults.

(6.1) Architecture Selection by hyperparameter tuning

Observation:

  1. The best recorded loss is 0.66, which is higher than that obtained using defaults, but it requires 6 hidden layers with the neuron counts listed above.

Observations:

  1. The parallel coordinate plot shows the different combinations of values explored to reach the best objective value of 0.66.

(6.2) Coarse tuning by hyperparameter tuning

Observation:

  1. The coarse tuning has resulted in a significant improvement of 3% in accuracy, with relu as the activation function and the lecun_uniform weight initializer.

(6.3) Fine tuning - Hunt for optimizer

Observations:

  1. The fine tuning has resulted in an accuracy of 0.69, an improvement of 1%, with the Adam optimizer and a learning rate of 0.0006.

(6.4) Final tuning by hyperparameter tuning

Observation:

  1. The best performance at this stage is 0.688, i.e. a decrease of 0.3%, at 500 epochs and a batch size of 88. Hence we do not pick these values and instead use the default values of 10 and 100 for batch_size and epochs respectively.

Manually overriding batch_size and epochs through random experimentation

(6.5) Consolidating the hyperparameter list to be used in GUI

(7) Conclusion:

  1. An ANN model was baselined against scaled, feature-engineered and feature-selected data. The baseline using model-based feature selection performed better than the other two.
  2. Sparse categorical cross-entropy was used as the loss function and accuracy as the performance metric in training and testing the ANN.
  3. The entire dataset was split into train and test sets. Validation data was split from the training data at the time of model fitting.
  4. A multi stage strategy was applied in tuning hyper parameters of the ANN.

    Stage 1: Architecture Selection - Accuracy of 66%

    Stage 2: Coarse tuning - Accuracy of 68%

    Stage 3 : Fine tuning - Accuracy of 69%

    Stage 4 : Final tuning - Accuracy of ~69%

    However, all hyperparameters prior to final tuning were retained and manually tweaked for stage 4 parameters.

  5. A web application based on Streamlit, an open-source framework, was created to test the hyperparameters and save the model. You can try it at http://34.70.58.49:8085/. Please make sure to upload the data file and copy-paste the hyperparameters above into the respective text input areas.